[SPARK-3812] [BUILD] Adapt maven build to publish effective pom. #2673
Conversation
QA tests have started for PR 2673 at commit
@pwendell Take a look whenever you get time. It would be good if we could publish https://github.com/ScrapCodes/effective-pom-plugin.
I will have to add a similar thing for http://maven.apache.org/plugins/maven-deploy-plugin/deploy-file-mojo.html, but I am not sure about the repository URL field.
QA tests have finished for PR 2673 at commit
Test PASSed.
@ScrapCodes what's the end goal here? I understand what your change does, but I'm confused about how this actually translates to the published bits, unless there's more information that I'm missing. AFAIK, right now, only one "spark-core_2.10" artifact is published, as can be seen at: http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10. So, with this fix, the default Hadoop version for that artifact would change if the build that publishes it is compiled with a different Hadoop version. Or is there a plan to publish different Hadoop versions under different artifactIds (or versions)?
Well, sorry for the confusion. With the bits published now, one can easily exclude one dependency and include another, as long as it is only a few deps. But once we start moving towards supporting Scala 2.11, we will want to publish artifacts for both Scala 2.10 and 2.11, and at that point publishing effective pom(s) will be essential. There is a related discussion on another PR, #2615.
Hey @vanzin, this is something we need to support cross-building for 2.10 and 2.11, but it's also a broadly useful thing that is lacking in the Maven build ecosystem (SBT has this feature out of the box). The reason it's needed is that we need to publish a different dependency tree for each artifact, say one for spark-core_2.10 and one for spark-core_2.11.

The reason it's useful more broadly is that we have a lot of profiles in Spark's build that get published with our poms, even though these are not usable by projects that depend on Spark (profiles are not a concept that works transitively across projects - or at least many build tools, such as sbt, do not respect profiles from other projects). They just make our poms look more confusing to people. What we really want to publish is a single set of dependencies for each Spark artifact.
Thanks for the explanation. The issue with the Scala versions makes sense. What threw me off was the Hadoop example: I've always seen people say that the Spark API is independent of the Hadoop version, and that they should explicitly declare the Hadoop version they want in their projects (and have a matching Spark deployment). So explaining this PR in terms of publishing different Hadoop versions sounds a little bit at odds with that.
Hi @pwendell, I had a similar issue related to artifacts in Maven Central and Hadoop versions.
@vanzin yeah, I agree - I think it was just sort of a red herring based on the example in @ScrapCodes' description. AFAIK this is totally unrelated to Hadoop.
Can one of the admins verify this patch?
@pwendell As a fun fact, HBase publishes jars for hadoop1 and hadoop2 like so: http://search.maven.org/#artifactdetails%7Corg.apache.hbase%7Chbase%7C0.98.6.1-hadoop2%7Cpom. They have some tooling to do this.
QA tests have started for PR 2673 at commit
QA tests have finished for PR 2673 at commit
Test PASSed.
@pwendell I tried to make the maven-shade-plugin work as an effective-pom generator, but it does not do that unless there are dependencies, apart from the project's own artifact, to put in the uber jar. Please see the commits above and their commit messages for the reasons.
QA tests have started for PR 2673 at commit
QA tests have finished for PR 2673 at commit
Test PASSed.
@pwendell I don't see an easy way with the maven-shade-plugin either. Do you? One way is to include a fake dependency and then ask the plugin to shade it across all artifacts, but that somehow felt more invasive to me.
What if we just add the maven-shade plugin but configure it so that it effectively performs no operations? I.e., we add a relocation of a class that does not exist. Will it still produce an effective pom?
I think a solution that doesn't require us to use our own plugin is good, even if we do something silly like shading a class that is in fact unused.
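For concreteness, a minimal sketch of the kind of no-op shade configuration being suggested here. The relocation pattern org.dummy.DoesNotExist is a made-up placeholder, and, as noted earlier in the thread, the plugin may still skip generating the reduced pom when it has nothing real to shade:

<!-- Sketch: no-op maven-shade-plugin setup with a dummy relocation.
     org.dummy.DoesNotExist is a placeholder class that does not exist,
     so nothing in the jar is actually rewritten. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Replace the main artifact instead of attaching a -shaded one. -->
    <shadedArtifactAttached>false</shadedArtifactAttached>
    <!-- createDependencyReducedPom defaults to true; that generated
         dependency-reduced-pom.xml is the flattened pom being relied on. -->
    <relocations>
      <relocation>
        <pattern>org.dummy.DoesNotExist</pattern>
        <shadedPattern>org.shaded.dummy.DoesNotExist</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>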
Even if we do this trick and shade something in all artifacts, what about spark-parent? (I mean, how do we get its effective pom?) Since we don't build a jar there, the shading plugin throws NPEs.
Hm - is there any workaround to that? We don't actually need to publish the parent pom anyway if we are using effective poms. Could we just install the other modules only, and not the parent? The current approach also adds Guava logic to the parent pom, which doesn't make much sense. I think ideally we'd just have a very simple "dummy" relocation (e.g. without any exclusions) in the parent, and the existing shading stuff around Guava would live in the core module.
We can definitely install the other modules; I am afraid, though, that the resultant (effective) pom(s) will carry a reference to the parent pom. Let me try that out.
(Force-pushed from 242a24e to e846069.)
Yeah, I verified the resultant dependency tree, as shown in the gists above.
Jenkins, test this please.
QA tests have started for PR 2673 at commit
Tests timed out for PR 2673 at commit
Test FAILed.
Hey @ScrapCodes @pwendell, funny thing. I've been playing with a different part of the build and found out I need exactly this to get what I want to work. I think I found a workaround that doesn't require a new plugin, although it's not pretty. What you can do is create a "dummy" sub-module that just publishes an empty jar (or a jar with any random file, it really doesn't matter). In the projects where you want to publish the effective pom, add a call to the maven-shade-plugin that only shades this dummy dependency, and actually excludes all of its files in the shade plugin's filter configuration. That has both the effect of not packaging anything else in the final jar, and of removing the dummy dependency from the published pom. But well, it's pretty ugly.
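A sketch of that workaround, using hypothetical coordinates org.example:dummy for the empty sub-module's artifact: the artifactSet restricts shading to the dummy jar, and the filter then drops every file from it, so the final jar is unchanged while the shade plugin still writes its dependency-reduced pom.

<!-- The dummy dependency (org.example:dummy is a hypothetical placeholder). -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>dummy</artifactId>
  <version>1.0.0</version>
</dependency>

<!-- Shade config in each module whose effective pom should be published. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <shadedArtifactAttached>false</shadedArtifactAttached>
    <artifactSet>
      <includes>
        <!-- Shade only the dummy artifact, nothing else. -->
        <include>org.example:dummy</include>
      </includes>
    </artifactSet>
    <filters>
      <filter>
        <!-- Drop every file from the dummy jar, leaving the final jar unchanged. -->
        <artifact>org.example:dummy</artifact>
        <excludes>
          <exclude>**/*</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Since createDependencyReducedPom defaults to true, the generated dependency-reduced-pom.xml should carry the profile-resolved dependencies with the dummy artifact removed.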
Yay, I just noticed your last patch takes exactly that approach.
@vanzin yep, that's exactly what we did!
Jenkins, test this please.
QA tests have started for PR 2673 at commit
Tests timed out for PR 2673 at commit
Test FAILed.
(Force-pushed from 6fff1ce to d7cf5db.)
Not sure which is better: creating an empty project with just a pom file in it, or depending on a random jar from Maven Central? I prefer the first approach.
I'll just publish an empty jar in maven central - we can use it forever.
I created a jar you can use. It might take a couple of hours to propagate to maven central:
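The exact coordinates are not shown in this thread. As an assumption on my part, based on the org.spark-project.spark:unused artifact that Spark's parent pom later references, depending on such an empty jar would be a one-liner:

<!-- Assumed coordinates of the published empty jar (not confirmed in this thread). -->
<dependency>
  <groupId>org.spark-project.spark</groupId>
  <artifactId>unused</artifactId>
  <version>1.0.0</version>
</dependency>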
(Force-pushed from d7cf5db to aa7b91d.)
@ScrapCodes here's an approach that doesn't rely on maven central: feel free to use / adapt it if you guys think that's preferred. (Also note that this solution runs the maven-shade-plugin even for modules that are not "jar" modules; it seems to work and not do anything evil, though.)
Jenkins, test this please.
QA tests have started for PR 2673 at commit
QA tests have finished for PR 2673 at commit
Test PASSed.
Okay - both approaches seem decent, but the current one is a bit simpler for developers (otherwise we have to explain this extra dummy module to them).
@ScrapCodes @pwendell
Thanks - I reverted this patch in master.
I tried the maven-help-plugin first, but that published all projects in the top-level pom, so I was left with no choice but to roll my own trivial plugin. This patch basically installs an effective pom after maven install is finished.
The problem it fixes is described as follows:
If you install using maven:
mvn install -DskipTests -Dhadoop.version=2.2.0 -Phadoop-2.2
then without this patch the published pom(s) will still declare the default Hadoop version, 1.0.4. This can be a problem at some point.
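To illustrate why (a simplified sketch; Spark's actual pom layout is more involved): the published pom keeps the default property plus the inactive profile, and downstream consumers, who never activate the profile, resolve the default.

<!-- What the published pom effectively carries (simplified sketch): -->
<properties>
  <!-- Default seen by anyone consuming the pom without profiles. -->
  <hadoop.version>1.0.4</hadoop.version>
</properties>
<profiles>
  <profile>
    <!-- Only active when building with -Phadoop-2.2; a project that
         merely depends on the published artifact never activates it. -->
    <id>hadoop-2.2</id>
    <properties>
      <hadoop.version>2.2.0</hadoop.version>
    </properties>
  </profile>
</profiles>

Publishing the effective pom instead bakes the resolved versions directly into the dependency list; locally, the maven-help-plugin's mvn help:effective-pom shows the same flattened view.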